HyDe: a Python Package for Genome-Scale Hybridization Detection.

Authors

  • Paul D Blischak
  • Julia Chifman
  • Andrea D Wolfe
  • Laura S Kubatko
Abstract

The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybridization using these site pattern probabilities exploits the relationship between the coalescent process for gene trees within population trees and the process of mutation along the branches of the gene trees. For certain models, site patterns are predicted to occur in equal frequency (i.e., their difference is 0), producing a set of functions called phylogenetic invariants. In this paper we introduce HyDe, a software package for detecting hybridization using phylogenetic invariants arising under the coalescent model with hybridization. HyDe is written in Python, and can be used interactively or through the command line using pre-packaged scripts. We demonstrate the use of HyDe on simulated data, as well as on two empirical data sets from the literature. We focus in particular on identifying individual hybrids within population samples and on distinguishing between hybrid speciation and gene flow. HyDe is freely available as an open source Python package under the GNU GPL v3 on both GitHub (https://github.com/pblischak/HyDe) and the Python Package Index (PyPI: https://pypi.python.org/pypi/phyde).
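The idea of comparing site pattern frequencies can be illustrated with a minimal sketch. The code below is not HyDe's API or its test statistic; the helper names `site_pattern` and `count_site_patterns` and the toy alignment are hypothetical. It counts site patterns in a four-taxon alignment and computes the difference between two pattern counts, in the spirit of the familiar ABBA-BABA contrast, where the two counts are expected to be roughly equal (difference near 0) in the absence of gene flow.

```python
from collections import Counter


def site_pattern(column):
    """Encode one alignment column as a pattern string such as 'ABBA'.

    The first distinct base seen is labelled 'A', the next 'B', and so on.
    Returns None if the column contains anything other than A, C, G, or T.
    """
    labels = {}
    encoded = []
    for base in column:
        if base not in "ACGT":
            return None  # skip gaps and ambiguous bases
        if base not in labels:
            labels[base] = "ABCD"[len(labels)]
        encoded.append(labels[base])
    return "".join(encoded)


def count_site_patterns(seqs):
    """Count site patterns over the columns of equal-length sequences."""
    counts = Counter()
    for column in zip(*seqs):
        pattern = site_pattern(column)
        if pattern is not None:
            counts[pattern] += 1
    return counts


# Toy data (made up for illustration). Row order: P1, P2, P3, outgroup.
toy_alignment = [
    "AACTGCAN",  # P1
    "AGTTACCA",  # P2
    "AGCCACAA",  # P3
    "AATCGCCA",  # outgroup
]

counts = count_site_patterns(toy_alignment)

# With this first-seen encoding, the usual "BABA" pattern appears as "ABAB".
difference = counts["ABBA"] - counts["ABAB"]
print(dict(counts), difference)
```

On real data, a difference like this would be evaluated against its sampling variance rather than read off directly; the sketch only shows how the pattern counts themselves are obtained.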

Similar resources

HyDe: a Python Package for Genome-Scale Hybridization

The analysis of hybridization and gene flow among closely related taxa is a common goal for researchers studying speciation and phylogeography in natural populations. Many methods for hybridization detection use simple site pattern frequencies from observed genomic data and compare them to null models that predict an absence of gene flow. The theory underlying the detection of hybr...

pyMAP: a Python package for small and large scale analysis of Illumina 450k methylation platform

Summary: PyMAP is a native Python module for analysis of the 450k methylation platform and is freely available for public use. The package can be easily deployed to cloud platforms that support the Python scripting language for large-scale methylation studies. By implementing fast parsing functionality, this module can be used to analyze large-scale methylation datasets. Additionally, command-line exec...

Mackinac: a bridge between ModelSEED and COBRApy to generate and analyze genome-scale metabolic models

Summary: Reconstructing and analyzing a large number of genome-scale metabolic models is a fundamental part of the integrated study of microbial communities; however, two of the most widely used frameworks for building and analyzing models use different metabolic network representations. Here we describe Mackinac, a Python package that combines ModelSEED's ability to automatically reconstruct me...

pyGeno: A Python package for precision medicine and proteogenomics [version 1; referees: awaiting peer review]

pyGeno is a Python package mainly intended for precision medicine applications that revolve around genomics and proteomics. It integrates reference sequences and annotations from Ensembl, genomic polymorphisms from the dbSNP database and data from next-gen sequencing into an easy to use, memory-efficient and fast framework, therefore allowing the user to easily explore subject-specific genomes ...

pyGeno: A Python package for precision medicine and proteogenomics

pyGeno is a Python package mainly intended for precision medicine applications that revolve around genomics and proteomics. It integrates reference sequences and annotations from Ensembl, genomic polymorphisms from the dbSNP database and data from next-gen sequencing into an easy to use, memory-efficient and fast framework, therefore allowing the user to easily explore subject-specific genomes ...

Journal:
  • Systematic Biology

Publication date: 2018